Abstract. Though the Shannon entropy of a probability measure P, defined as $-\int_X \frac{dP}{d\mu}\ln\frac{dP}{d\mu}\,d\mu$ on a measure space $(X, \mathfrak{M}, \mu)$, does not qualify as an information measure (it is not a natural extension of the discrete case), maximum entropy (ME) prescriptions in the measure-theoretic case are consistent with those of the discrete case. In this paper, we study the measure-theoretic definitions of generalized information measures and discuss the ME prescriptions. We present two results in this regard: (i) we prove that, as in the case of classical relative-entropy, the measure-theoretic definitions of the generalized relative-entropies, Renyi and Tsallis, are natural extensions of their respective discrete cases; (ii) we show that the ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case.
1. Introduction

Shannon's measure of information was developed essentially for the case when the random variable takes a finite number of values. However, in the literature one often encounters an extension of Shannon entropy in the discrete case to the case of a one-dimensional random variable with density function p in the form (e.g. [?])

$$ S(p) = -\int_{-\infty}^{+\infty} p(x)\ln p(x)\,dx . $$
This entropy in the continuous case, as a purely mathematical formula (assuming convergence of the integral and absolute continuity of the density p with respect to Lebesgue measure), resembles Shannon entropy in the discrete case, but it cannot be used as a measure of information. First, it is not a natural extension of Shannon entropy in the discrete case, since it is not the limit of the sequence of finite discrete entropies corresponding to pmfs which approximate the pdf p. Second, it is not necessarily positive.

In spite of these shortcomings, one can still use the continuous entropy functional in conjunction with the principle of maximum entropy, where one wants to find a probability density function that has greater uncertainty than any other distribution satisfying a set of given constraints. Thus, in this use of the continuous measure one is interested in it as a measure of relative uncertainty, and not of absolute uncertainty. This is where one can relate maximization of Shannon entropy to the minimization of Kullback-Leibler relative-entropy (see [?, pp. 55]).

Indeed, during the early stages of development of information theory, the important paper by Gelfand, Kolmogorov and Yaglom [3] called attention to the case of defining the entropy functional on an arbitrary measure space $(X, \mathfrak{M}, \mu)$. In this respect, the Shannon entropy of a probability density function $p : X \to \mathbb{R}^+$ can be written as

$$ S(p) = -\int_X p(x)\ln p(x)\,d\mu(x) . $$
One can see from the above definition that the concept of "the entropy of a pdf" is a misnomer: there is always another measure $\mu$ in the background. In the discrete case considered by Shannon, $\mu$ is the cardinality measure§ [1, pp. 19]; in the continuous case considered by both Shannon and Wiener, $\mu$ is the Lebesgue measure (cf. [?, pp. 54] and [?, pp. 61, 62]). All entropies are defined with respect to some measure $\mu$, as Shannon and Wiener both emphasized in [?, pp. 57, 58] and [?, pp. 61, 62] respectively.

This case was studied independently by Kallianpur [6] and Pinsker [7], and perhaps others were guided by the earlier work of Kullback, where one would define entropy in terms of the Kullback-Leibler relative-entropy. Unlike Shannon entropy, the measure-theoretic definition of KL-entropy is a natural extension of the definition in the discrete case.

In this paper we present the measure-theoretic definitions of generalized information measures and show that, as in the case of KL-entropy, the measure-theoretic definitions of the generalized relative-entropies, Renyi and Tsallis, are natural extensions of their respective discrete cases. We discuss the ME prescriptions for generalized entropies and show that the ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case, as is the case for measure-theoretic Shannon entropy.

§ The counting or cardinality measure $\mu$ on a measurable space $(X, \mathfrak{M})$, where X is a finite set and $\mathfrak{M} = 2^X$, is defined as $\mu(E) = \#E$, $\forall E \in \mathfrak{M}$.

Rigorous studies of the Shannon and KL entropy functionals in measure spaces can be found in the papers by Ochs [?] and by Masani [?]. Basic measure-theoretic aspects of classical information measures can be found in [7, 12] and [?].

We review the measure-theoretic formalisms for classical information measures in §2 and extend these definitions to generalized information measures in §3. In §4 we present the ME prescription for Shannon entropy, followed by the prescriptions for Tsallis entropy in §5. We revisit the measure-theoretic definitions of generalized entropic functionals in §6 and present some results.



2. Measure-Theoretic Definitions of Classical Information Measures
2.1. Discrete to Continuous 

Let $p : [a, b] \to \mathbb{R}^+$ be a probability density function, where $[a, b] \subset \mathbb{R}$. That is, p satisfies

$$ p(x) \ge 0,\ \forall x \in [a, b] \quad\text{and}\quad \int_a^b p(x)\,dx = 1 . $$

In trying to define entropy in the continuous case, the expression of Shannon entropy was automatically extended by replacing the sum in the discrete Shannon entropy by the corresponding integral. We obtain, in this way, Boltzmann's H-function (also known as differential entropy in information theory),

$$ S(p) = -\int_a^b p(x)\ln p(x)\,dx . \tag{1} $$

But the "continuous entropy" given by (1) is not a natural extension of the definition in the discrete case, in the sense that it is not the limit of the finite discrete entropies corresponding to a sequence of finer partitions of the interval [a, b] whose norms tend to zero. We can show this by a counterexample. Consider the uniform probability distribution on the interval [a, b], having the probability density function

$$ p(x) = \frac{1}{b-a} , \quad x \in [a, b] . $$

The continuous entropy (1) in this case is

$$ S(p) = \ln(b - a) . $$

On the other hand, consider a finite partition of the interval [a, b] composed of n equal subintervals, and attach to this partition the finite discrete uniform probability distribution, whose entropy is, of course,

$$ S_n(p) = \ln n . $$

Obviously, as n tends to infinity, the discrete entropy $S_n(p)$ tends to infinity too, and not to $\ln(b - a)$; therefore S(p) is not the limit of $S_n(p)$ as n tends to infinity. Further, one can observe that $\ln(b - a)$ is negative when $b - a < 1$.
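The counterexample can be reproduced numerically. The following is a minimal Python sketch, not taken from the paper: `discretized_entropy` is a hypothetical helper that integrates a pdf over n equal cells of [a, b] with a midpoint rule and returns the Shannon entropy of the resulting pmf. For the uniform density the discrete entropy equals $\ln n$ and grows without bound, while (1) stays at $\ln(b - a)$, which is negative here because $b - a = 0.5$.

```python
import math

def discretized_entropy(pdf, a, b, n, sub=64):
    """Shannon entropy of the pmf obtained by integrating `pdf` over n equal cells of [a, b].

    Cell probabilities use a `sub`-point midpoint rule (an assumption of this sketch)."""
    width = (b - a) / n
    h = width / sub
    pmf = []
    for i in range(n):
        left = a + i * width
        pmf.append(sum(pdf(left + (j + 0.5) * h) for j in range(sub)) * h)
    return -sum(q * math.log(q) for q in pmf if q > 0)

a, b = 0.0, 0.5
uniform = lambda x: 1.0 / (b - a)
for n in (2, 16, 1024):
    print(n, discretized_entropy(uniform, a, b, n), math.log(n))
print("continuous entropy (1):", math.log(b - a))  # negative, since b - a < 1
```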

Thus, strictly speaking, the continuous entropy (1) cannot represent a measure of uncertainty, since uncertainty should in general be positive. We are able to prove the "nice" properties only for the discrete entropy; therefore, it qualifies as a "good" measure of the information (or uncertainty) supplied by a random experiment. Since the "continuous entropy" is not the limit of the discrete entropies, we cannot extend these so-called nice properties to it.

Also, in physical applications, the coordinate x in (1) represents an abscissa, a distance from a fixed reference point. This distance x has the dimensions of length. Since, with the density function p(x), one specifies the probability of an event $[c, d) \subset [a, b]$ as $\int_c^d p(x)\,dx$, one has to assign to p the dimensions $(\text{length})^{-1}$, because probabilities are dimensionless. Now for $|z| < 1$ one has the series expansion

$$ -\ln(1 - z) = z + \frac{1}{2}z^2 + \frac{1}{3}z^3 + \cdots , $$

so it is necessary that the argument of the logarithm function in (1) be dimensionless. Hence the formula (1) is seen to be dimensionally incorrect, since the argument of the logarithm on its right-hand side has the dimensions of a probability density [14]. Although Shannon [?] used the formula (1), he does note its lack of invariance with respect to changes in the coordinate system.
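The lack of invariance can be seen by changing the unit of length. In this illustrative sketch (the density $p(x) = 2x$ on [0, 1], the factor c = 100 and the helper `diff_entropy` are assumptions, not from the paper), the same distribution expressed in the new unit has density $p_c(y) = p(y/c)/c$ on $[0, c]$, and the value of (1) shifts by exactly $\ln c$.

```python
import math

def diff_entropy(pdf, a, b, n=200_000):
    """Midpoint-rule approximation of S(p) = -int_a^b p ln p dx, eq. (1)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        v = pdf(a + (i + 0.5) * h)
        if v > 0:
            total -= v * math.log(v) * h
    return total

p = lambda x: 2.0 * x           # triangular density on [0, 1]
c = 100.0                       # e.g. metres -> centimetres
p_c = lambda y: p(y / c) / c    # same distribution expressed in the new unit

s = diff_entropy(p, 0.0, 1.0)
s_c = diff_entropy(p_c, 0.0, c)
print(s, s_c, s_c - s, math.log(c))  # difference is ln c (up to rounding)
```

The exact value for this density is $S(p) = \frac{1}{2} - \ln 2$, so the same physical distribution gets a negative entropy in one unit and a positive one in the other.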

In the context of the maximum entropy principle, Jaynes [?] addressed this problem and suggested the formula

$$ S'(p) = -\int_a^b p(x)\ln\frac{p(x)}{m(x)}\,dx \tag{2} $$

in place of (1), where m(x) is a prior function. Note that when m(x) is a probability density function, (2) is nothing but the negative of the relative-entropy of p with respect to m. However, if we choose m(x) = c, a constant (e.g. [?]), we get

$$ S'(p) = S(p) + \ln c , \tag{3} $$

where S(p) refers to the continuous entropy (1). Thus, maximization of S'(p) is equivalent to maximization of S(p). Further discussion on estimation of probability density functions by the ME principle in the continuous case can be found in [?].
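Identity (3) can be checked numerically. This sketch assumes the illustrative density $p(x) = 2x$ on [0, 1] and the constant prior $m(x) = c = 3$; `jaynes_entropy` is a hypothetical helper evaluating (2) by a midpoint rule. Taking $m(x) = 1$ recovers (1), whose exact value for this density is $\frac{1}{2} - \ln 2$.

```python
import math

def jaynes_entropy(pdf, prior, a, b, n=100_000):
    """Midpoint-rule approximation of S'(p) = -int_a^b p ln(p/m) dx, eq. (2)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        v = pdf(x)
        if v > 0:
            total -= v * math.log(v / prior(x)) * h
    return total

p = lambda x: 2.0 * x
c = 3.0
s_prime = jaynes_entropy(p, lambda x: c, 0.0, 1.0)
s = jaynes_entropy(p, lambda x: 1.0, 0.0, 1.0)   # m = 1 gives eq. (1)
print(s_prime - s, math.log(c))                   # eq. (3)
```

Because the shift $\ln c$ does not depend on p, any density maximizing (2) under given constraints also maximizes (1).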

Prior to that, Kullback [?] had also suggested that in the measure-theoretic definition of entropy, instead of examining the entropy corresponding to only one given measure, we have to compare the entropy inside a whole class of measures.

2.2. Classical information measures

Let $(X, \mathfrak{M}, \mu)$ be a measure space; $\mu$ need not be a probability measure unless otherwise specified. The symbols P, R will denote probability measures on the measurable space $(X, \mathfrak{M})$, and p, r denote $\mathfrak{M}$-measurable functions on X. An $\mathfrak{M}$-measurable function $p : X \to \mathbb{R}^+$ is said to be a probability density function (pdf) if $\int_X p\,d\mu = 1$.
In this general setting, the Shannon entropy S(p) of a pdf p is defined as follows [?].

Definition 2.1. Let $(X, \mathfrak{M}, \mu)$ be a measure space and let the $\mathfrak{M}$-measurable function $p : X \to \mathbb{R}^+$ be a pdf. The Shannon entropy of p is defined as

$$ S(p) = -\int_X p\ln p\,d\mu , \tag{4} $$

provided the integral on the right exists.

The entropy functional S(p) defined in (4) can be referred to as the entropy of the probability measure P, in the sense that the measure P is induced by p, i.e.,

$$ P(E) = \int_E p(x)\,d\mu(x) , \quad \forall E \in \mathfrak{M} . \tag{5} $$

This reference is consistent‖ because the probability measure P can be identified $\mu$-a.e. by the pdf p.

Further, the definition of the probability measure P in (5) allows us to write the entropy functional (4) as

$$ S(p) = -\int_X \frac{dP}{d\mu}\ln\frac{dP}{d\mu}\,d\mu , \tag{6} $$

since (5) implies¶ $P \ll \mu$, and the pdf p is the Radon-Nikodym derivative of P w.r.t. $\mu$.

Now we proceed to the definition of the Kullback-Leibler relative-entropy, or KL-entropy, for probability measures.

Definition 2.2. Let $(X, \mathfrak{M})$ be a measurable space. Let P and R be two probability measures on $(X, \mathfrak{M})$. The Kullback-Leibler relative-entropy (KL-entropy) of P relative to R is defined as

$$ I(P\|R) = \begin{cases} \int_X \ln\frac{dP}{dR}\,dP & \text{if } P \ll R , \\ +\infty & \text{otherwise.} \end{cases} \tag{7} $$

The divergence inequality $I(P\|R) \ge 0$, with $I(P\|R) = 0$ if and only if P = R, can be shown in this case too. The KL-entropy (7) can also be written as

$$ I(P\|R) = \int_X \frac{dP}{dR}\ln\frac{dP}{dR}\,dR . \tag{8} $$

Let $\mu$ be a $\sigma$-finite measure on $(X, \mathfrak{M})$ such that $P \ll R \ll \mu$. Since $\mu$ is $\sigma$-finite, by the Radon-Nikodym theorem there exist non-negative $\mathfrak{M}$-measurable functions $p : X \to \mathbb{R}^+$ and $r : X \to \mathbb{R}^+$, unique $\mu$-a.e., such that

$$ P(E) = \int_E p\,d\mu , \quad \forall E \in \mathfrak{M} , \tag{9} $$

and

$$ R(E) = \int_E r\,d\mu , \quad \forall E \in \mathfrak{M} . \tag{10} $$

The pdfs p and r in (9) and (10) (they are indeed pdfs) are the Radon-Nikodym derivatives of the probability measures P and R with respect to $\mu$, respectively, i.e., $p = \frac{dP}{d\mu}$ and $r = \frac{dR}{d\mu}$. Now one can define the relative-entropy of the pdf p w.r.t. r as follows†.

‖ Say p and r are two pdfs and P and R are the corresponding induced measures on the measurable space $(X, \mathfrak{M})$ such that P and R are identical, i.e., $\int_E p\,d\mu = \int_E r\,d\mu$, $\forall E \in \mathfrak{M}$. Then we have $p = r$ $\mu$-a.e., and hence $-\int_X p\ln p\,d\mu = -\int_X r\ln r\,d\mu$.

¶ If a nonnegative measurable function f induces a measure $\nu$ on the measurable space $(X, \mathfrak{M})$ with respect to a measure $\mu$, defined as $\nu(E) = \int_E f\,d\mu$, $\forall E \in \mathfrak{M}$, then $\nu \ll \mu$. The converse is given by the Radon-Nikodym theorem [?, pp. 36, Theorem 1.40(b)].

Definition 2.3. Let $(X, \mathfrak{M}, \mu)$ be a measure space. Let the $\mathfrak{M}$-measurable functions $p, r : X \to \mathbb{R}^+$ be two pdfs. The KL-entropy of p relative to r is defined as

$$ I(p\|r) = \int_X p(x)\ln\frac{p(x)}{r(x)}\,d\mu(x) , \tag{11} $$

provided the integral on the right exists.

As we have mentioned earlier, the KL-entropy (11) is finite only if P is absolutely continuous with respect to R. On the real line the same definition can be written as

$$ I(p\|r) = \int_{\mathbb{R}} p(x)\ln\frac{p(x)}{r(x)}\,dx , $$

which exists if the densities p(x) and r(x) share the same support. Here and in the sequel we use the conventions

$$ \ln 0 = -\infty , \quad \ln\frac{a}{0} = +\infty \ \text{for any } a > 0 , \quad 0\cdot(\pm\infty) = 0 . \tag{12} $$
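A minimal sketch of the KL-entropy for pmfs on a finite set ($\mu$ the counting measure), with the conventions (12) applied explicitly; the function name is illustrative. A term with $P_k = 0$ contributes nothing, and a term with $P_k > 0$, $R_k = 0$ makes the result $+\infty$, since P is then not absolutely continuous with respect to R.

```python
import math

def kl_divergence(p, r):
    """I(P||R) = sum_k P_k ln(P_k / R_k) for pmfs given as sequences."""
    total = 0.0
    for pk, rk in zip(p, r):
        if pk == 0:
            continue              # 0 * ln(0 / r) = 0 by convention (12)
        if rk == 0:
            return math.inf       # P is not << R, so I(P||R) = +inf
        total += pk * math.log(pk / rk)
    return total

print(kl_divergence([0.5, 0.5, 0.0], [0.25, 0.25, 0.5]))  # ln 2
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))              # inf
```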

Now we turn to the definition of the entropy functional on a measure space. The entropy functional in (4) is defined for a probability measure that is induced by a pdf. By the Radon-Nikodym theorem, one can define Shannon entropy for any $\mu$-continuous probability measure as follows.

Definition 2.4. Let $(X, \mathfrak{M}, \mu)$ be a $\sigma$-finite measure space. The entropy of any $\mu$-continuous probability measure P ($P \ll \mu$) is defined as

$$ S(P) = -\int_X \ln\frac{dP}{d\mu}\,dP . \tag{13} $$

Properties of the entropy of a probability measure in Definition 2.4 are studied in detail by Ochs [?] under the name generalized Boltzmann-Gibbs-Shannon entropy. In the literature, one can find notation of the form $S(P|\mu)$ for the entropy functional in (13), viz., the entropy of a probability measure, to stress the role of the measure $\mu$ (e.g. [?]). Since all the information measures we define are with respect to the measure $\mu$ on $(X, \mathfrak{M})$, we omit $\mu$ from the entropy functional notation.

By assuming $\mu$ to be a probability measure in Definition 2.4, one can relate Shannon entropy to Kullback-Leibler entropy as

$$ S(P) = -I(P\|\mu) . \tag{14} $$

† This follows from the chain rule for Radon-Nikodym derivatives:

$$ \frac{dP}{dR} \overset{\text{a.e.}}{=} \frac{dP}{d\mu}\left(\frac{dR}{d\mu}\right)^{-1} . $$
Note that when $\mu$ is not a probability measure, the divergence inequality $I(P\|\mu) \ge 0$ need not be satisfied.
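Relation (14) and the remark above can be illustrated on a finite set. In this sketch (hypothetical helper names; the weights are arbitrary), `entropy_wrt` evaluates (13) for a reference measure given by positive weights $\mu_k$. With a probability reference measure it coincides with $-I(P\|\mu)$; with the counting measure ($\mu_k = 1$, not a probability measure) it is the ordinary Shannon entropy, and the corresponding $I(P\|\mu)$ is negative.

```python
import math

def entropy_wrt(P, mu):
    """S(P) = -sum_k P_k ln(P_k / mu_k), eq. (13) on a finite set (requires P << mu)."""
    return -sum(pk * math.log(pk / mk) for pk, mk in zip(P, mu) if pk > 0)

def kl(P, R):
    """I(P||R) = sum_k P_k ln(P_k / R_k); R may be any positive measure here."""
    return sum(pk * math.log(pk / rk) for pk, rk in zip(P, R) if pk > 0)

P = [0.5, 0.25, 0.25]
mu_prob = [0.2, 0.3, 0.5]       # a probability reference measure
counting = [1.0, 1.0, 1.0]      # counting measure: not a probability measure

print(entropy_wrt(P, mu_prob), -kl(P, mu_prob))  # equal, eq. (14)
print(kl(P, counting))          # negative: the divergence inequality fails
```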

A note on the $\sigma$-finiteness of the measure $\mu$. In the definition of the entropy functional we assumed that $\mu$ is a $\sigma$-finite measure. This condition was used by Ochs [?], Csiszár [22] and Rosenblatt-Roth [?] to tailor the measure-theoretic definitions. For all practical purposes and for most applications, this assumption is satisfied. (See [?] for a discussion of the physical interpretation of a measurable space $(X, \mathfrak{M})$ with $\sigma$-finite measure $\mu$ for an entropy measure of the form (13), and of relaxing the $\sigma$-finiteness condition.) By relaxing this condition, more universal definitions of entropy functionals are studied by Masani [?].

2.3. Interpretation of Discrete and Continuous Entropies in terms of KL-entropy

First, let us consider the discrete case of $(X, \mathfrak{M}, \mu)$, where $X = \{x_1, \ldots, x_n\}$, $\mathfrak{M} = 2^X$ and $\mu$ is a probability measure. Let P be any probability measure on $(X, \mathfrak{M})$. Then $\mu$ and P can be specified as follows:

$$ \mu:\ \mu_k = \mu(\{x_k\}) \ge 0,\ k = 1, \ldots, n, \quad \sum_{k=1}^n \mu_k = 1 , $$

and

$$ P:\ P_k = P(\{x_k\}) \ge 0,\ k = 1, \ldots, n, \quad \sum_{k=1}^n P_k = 1 . $$

The probability measure P is absolutely continuous with respect to the probability measure $\mu$ if $\mu_k = 0$ implies $P_k = 0$ for any $k = 1, \ldots, n$. The corresponding Radon-Nikodym derivative of P with respect to $\mu$ is given by

$$ \frac{dP}{d\mu}(x_k) = \frac{P_k}{\mu_k} , \quad k = 1, \ldots, n . $$

The measure-theoretic entropy S(P) (13), in this case, can be written as

$$ S(P) = -\sum_{k=1}^n P_k\ln\frac{P_k}{\mu_k} = \sum_{k=1}^n P_k\ln\mu_k - \sum_{k=1}^n P_k\ln P_k . $$

If we take the reference probability measure to be the uniform probability distribution on the set X, i.e. $\mu_k = \frac{1}{n}$, we obtain

$$ S(P) = S_n(P) - \ln n , \tag{15} $$

where $S_n(P)$ denotes the Shannon entropy of the pmf $P = (P_1, \ldots, P_n)$ and S(P) denotes the measure-theoretic entropy in the discrete case.

Now let us consider the continuous case of $(X, \mathfrak{M}, \mu)$, where $X = [a, b] \subset \mathbb{R}$, $\mathfrak{M}$ is the set of Lebesgue measurable subsets of [a, b], and $\mu$ is a probability measure with a density with respect to Lebesgue measure. In this case $\mu$ and P can be specified as follows:

$$ \mu:\ \mu(x) \ge 0,\ x \in [a, b],\ \text{such that}\ \mu(E) = \int_E \mu(x)\,dx,\ \forall E \in \mathfrak{M}, \quad \int_a^b \mu(x)\,dx = 1 , $$

and

$$ P:\ P(x) \ge 0,\ x \in [a, b],\ \text{such that}\ P(E) = \int_E P(x)\,dx,\ \forall E \in \mathfrak{M}, \quad \int_a^b P(x)\,dx = 1 . $$

Note the abuse of notation in the above specification of the probability measures $\mu$ and P, where we have used the same symbols for both the measures and their pdfs.

The probability measure P is absolutely continuous with respect to the probability measure $\mu$ if $\mu(x) = 0$ on a set of positive Lebesgue measure implies that $P(x) = 0$ on the same set. The Radon-Nikodym derivative of the probability measure P with respect to the probability measure $\mu$ is then

$$ \frac{dP}{d\mu}(x) = \frac{P(x)}{\mu(x)} . $$

Then the measure-theoretic entropy S(P) in this case can be written as

$$ S(P) = -\int_a^b P(x)\ln\frac{P(x)}{\mu(x)}\,dx . $$



If we take the reference probability measure $\mu$ to be the uniform distribution, i.e. $\mu(x) = \frac{1}{b-a}$, $x \in [a, b]$, then we obtain

$$ S(P) = S_{[a,b]}(P) - \ln(b - a) , $$

where $S_{[a,b]}(P)$ denotes the Shannon entropy (1) of the pdf P(x), $x \in [a, b]$, and S(P) denotes the measure-theoretic entropy in the continuous case.

Hence, one can conclude that the measure-theoretic entropy S(P) defined for a probability measure P on the measure space $(X, \mathfrak{M}, \mu)$ is equal to Shannon entropy in both the discrete and the continuous case up to an additive constant, when the reference measure $\mu$ is chosen as a uniform probability distribution. On the other hand, one can see that the measure-theoretic KL-entropy in the discrete and continuous cases is equal to its discrete and continuous definitions, respectively.

Further, from (14) and (15), we can write Shannon entropy in terms of the Kullback-Leibler relative-entropy:

$$ S_n(P) = \ln n - I(P\|\mu) . \tag{16} $$
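A quick numerical check of (15) and (16) with the uniform reference measure $\mu_k = 1/n$; the pmf below is an arbitrary example and the helpers are illustrative.

```python
import math

def shannon(P):
    """Shannon entropy S_n(P) of a pmf."""
    return -sum(p * math.log(p) for p in P if p > 0)

def kl(P, R):
    """KL-entropy I(P||R) of pmfs with P << R."""
    return sum(p * math.log(p / r) for p, r in zip(P, R) if p > 0)

P = [0.1, 0.2, 0.3, 0.4]
n = len(P)
uniform = [1.0 / n] * n
s_measure = -kl(P, uniform)                      # measure-theoretic S(P), eq. (14)
print(s_measure, shannon(P) - math.log(n))       # eq. (15)
print(shannon(P), math.log(n) - kl(P, uniform))  # eq. (16)
```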

Thus, Shannon entropy appears as being (up to an additive constant) the variation of information when we pass from the initial uniform probability distribution to the new probability distribution given by $P_k \ge 0$, $\sum_{k=1}^n P_k = 1$, as any such probability distribution is obviously absolutely continuous with respect to the uniform discrete probability distribution. Similarly, from (14) and the relation $S(P) = S_{[a,b]}(P) - \ln(b - a)$ obtained above for the continuous case, we can write the Boltzmann H-function in terms of relative-entropy as

$$ S_{[a,b]}(p) = \ln(b - a) - I(p\|\mu) . \tag{17} $$

Therefore, the continuous entropy or Boltzmann H-function S(p) may be interpreted as being (up to an additive constant) the variation of information when we pass from the initial uniform probability distribution on the interval [a, b] to the new probability measure defined by the probability density function p(x) (any such probability measure is absolutely continuous with respect to the uniform probability distribution on the interval [a, b]).

Thus, KL-entropy equips one with a unified interpretation of both discrete entropy and continuous entropy. One can utilize Shannon entropy in the continuous case, as well as Shannon entropy in the discrete case, both being interpreted as the variation of information when we pass from the initial uniform distribution to the corresponding probability measure.

Also, since the measure-theoretic entropy is equal to the discrete and continuous entropies up to an additive constant, the ME prescriptions of measure-theoretic Shannon entropy are consistent with both the discrete case and the continuous case.

3. Measure-Theoretic Definitions of Generalized Information Measures 

We begin with a brief note on the notation and assumptions used. We define all the information measures on the measurable space $(X, \mathfrak{M})$, and the default reference measure is $\mu$ unless otherwise stated. To avoid clumsy formulations, we will not distinguish between functions differing only on a $\mu$-null set; nevertheless, we can work with equations between $\mathfrak{M}$-measurable functions on X if they are stated as being valid only $\mu$-almost everywhere ($\mu$-a.e. or a.e.). Further, we assume that all the quantities of interest exist, and we implicitly assume the $\sigma$-finiteness of $\mu$ and the $\mu$-continuity of probability measures whenever required. Since these assumptions occur repeatedly in various definitions and formulations, they will not be mentioned in the sequel. With these assumptions we do not distinguish between an information measure of a pdf p and that of the corresponding probability measure P; hence, although we give definitions of information measures for pdfs, we also use the corresponding definitions for probability measures whenever convenient or required, with the understanding that $P(E) = \int_E p\,d\mu$, the converse being due to the Radon-Nikodym theorem, where $p = \frac{dP}{d\mu}$. In both cases we have $P \ll \mu$.

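First we consider the Renyi generalizations. The measure-theoretic definition of Renyi entropy can be given as follows.

Definition 3.1. The Renyi entropy of a pdf $p : X \to \mathbb{R}^+$ on a measure space $(X, \mathfrak{M}, \mu)$ is defined as

$$ S_\alpha(p) = \frac{1}{1-\alpha}\ln\int_X p(x)^\alpha\,d\mu(x) , \tag{18} $$

provided the integral on the right exists and $\alpha \in \mathbb{R}$, $\alpha > 0$, $\alpha \ne 1$.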



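The same can be defined for any $\mu$-continuous probability measure P as

$$ S_\alpha(P) = \frac{1}{1-\alpha}\ln\int_X \left(\frac{dP}{d\mu}\right)^{\alpha-1} dP . \tag{19} $$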

On the other hand, Renyi relative-entropy can be defined as follows. 
Definition 3.2. Let $p, r : X \to \mathbb{R}^+$ be two pdfs on the measure space $(X, \mathfrak{M}, \mu)$. The Renyi relative-entropy of p relative to r is defined as

$$ I_\alpha(p\|r) = \frac{1}{\alpha-1}\ln\int_X \frac{p(x)^\alpha}{r(x)^{\alpha-1}}\,d\mu(x) , \tag{20} $$

provided the integral on the right exists and $\alpha \in \mathbb{R}$, $\alpha > 0$, $\alpha \ne 1$.

The same can be written in terms of probability measures as

$$ I_\alpha(P\|R) = \frac{1}{\alpha-1}\ln\int_X \left(\frac{dP}{dR}\right)^{\alpha} dR , \tag{21} $$

whenever $P \ll R$; $I_\alpha(P\|R) = +\infty$ otherwise. Further, if we assume that $\mu$ in (19) is a probability measure, then

$$ S_\alpha(P) = -I_\alpha(P\|\mu) . \tag{22} $$
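On a finite set with reference weights $\mu_k > 0$, (19) reads $S_\alpha(P) = \frac{1}{1-\alpha}\ln\sum_k (P_k/\mu_k)^\alpha\mu_k$ and (21) reads $I_\alpha(P\|R) = \frac{1}{\alpha-1}\ln\sum_k P_k^\alpha R_k^{1-\alpha}$. The sketch below (illustrative names and numbers, not from the paper) checks (22) and shows the Renyi relative-entropy approaching the KL-entropy (7) as $\alpha \to 1$.

```python
import math

def renyi_entropy(P, mu, alpha):
    """S_alpha(P) = 1/(1-alpha) ln sum_k (P_k/mu_k)^alpha mu_k, eqs. (18)-(19)."""
    s = sum((pk / mk) ** alpha * mk for pk, mk in zip(P, mu) if pk > 0)
    return math.log(s) / (1.0 - alpha)

def renyi_divergence(P, R, alpha):
    """I_alpha(P||R) = 1/(alpha-1) ln sum_k P_k^alpha R_k^(1-alpha), eqs. (20)-(21); needs P << R."""
    s = sum(pk ** alpha * rk ** (1.0 - alpha) for pk, rk in zip(P, R) if pk > 0)
    return math.log(s) / (alpha - 1.0)

P = [0.5, 0.3, 0.2]
mu = [0.2, 0.3, 0.5]          # a probability reference measure
alpha = 2.0
print(renyi_entropy(P, mu, alpha), -renyi_divergence(P, mu, alpha))  # eq. (22)

# alpha -> 1 recovers the KL-entropy (7)
kl = sum(p * math.log(p / m) for p, m in zip(P, mu))
print(renyi_divergence(P, mu, 1.0 + 1e-6), kl)
```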

Tsallis entropy in the measure-theoretic setting can be defined as follows.

Definition 3.3. The Tsallis entropy of a pdf p on $(X, \mathfrak{M}, \mu)$ is defined as

$$ S_q(p) = \int_X p(x)\ln_q\frac{1}{p(x)}\,d\mu(x) = \frac{1 - \int_X p(x)^q\,d\mu(x)}{q-1} , \tag{23} $$

provided the integral on the right exists and $q \in \mathbb{R}$, $q > 0$, $q \ne 1$.

Here $\ln_q$ in (23) is referred to as the q-logarithm and is defined as $\ln_q x = \frac{x^{1-q} - 1}{1-q}$ ($x > 0$, $q \in \mathbb{R}$). The same can be defined for a $\mu$-continuous probability measure P, and can be written as

$$ S_q(P) = \int_X \ln_q\left(\frac{dP}{d\mu}\right)^{-1} dP . \tag{24} $$

The definition of Tsallis relative-entropy is given below.

Definition 3.4. Let $(X, \mathfrak{M}, \mu)$ be a measure space. Let $p, r : X \to \mathbb{R}^+$ be two probability density functions. The Tsallis relative-entropy of p relative to r is defined as

$$ I_q(p\|r) = -\int_X p(x)\ln_q\frac{r(x)}{p(x)}\,d\mu(x) = \frac{\int_X p(x)^q r(x)^{1-q}\,d\mu(x) - 1}{q-1} , \tag{25} $$

provided the integral on the right exists and $q \in \mathbb{R}$, $q > 0$, $q \ne 1$.

The same can be written for two probability measures P and R as

$$ I_q(P\|R) = -\int_X \ln_q\left(\frac{dP}{dR}\right)^{-1} dP , \tag{26} $$

whenever $P \ll R$; $I_q(P\|R) = +\infty$ otherwise. If $\mu$ in (24) is a probability measure, then

$$ S_q(P) = -I_q(P\|\mu) . \tag{27} $$
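A finite-set sketch of the Tsallis quantities (hypothetical helper names; the pmf and weights are arbitrary). It checks that the $\ln_q$ form and the closed form in (23) agree for the counting measure, that (27) holds for a probability reference measure, and that $I_q(P\|P) = 0$.

```python
import math

def ln_q(x, q):
    """q-logarithm (x^(1-q) - 1)/(1-q); reduces to ln x at q = 1."""
    return math.log(x) if q == 1 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_entropy(P, mu, q):
    """S_q(P) = sum_k P_k ln_q(mu_k/P_k), i.e. eq. (24) on a finite set."""
    return sum(pk * ln_q(mk / pk, q) for pk, mk in zip(P, mu) if pk > 0)

def tsallis_divergence(P, R, q):
    """I_q(P||R) = -sum_k P_k ln_q(R_k/P_k), eqs. (25)-(26); needs P << R."""
    return -sum(pk * ln_q(rk / pk, q) for pk, rk in zip(P, R) if pk > 0)

P = [0.5, 0.3, 0.2]
q = 1.5
counting = [1.0, 1.0, 1.0]
closed_form = (1.0 - sum(p ** q for p in P)) / (q - 1.0)   # right-hand side of (23)
print(tsallis_entropy(P, counting, q), closed_form)

mu = [0.2, 0.3, 0.5]                                       # a probability reference measure
print(tsallis_entropy(P, mu, q), -tsallis_divergence(P, mu, q))  # eq. (27)
```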
4. Maximum Entropy and Canonical Distributions

For all the ME prescriptions of classical information measures we consider a set of constraints of the form

$$ \int_X u_m\,dP = \int_X u_m(x)p(x)\,d\mu(x) = \langle u_m\rangle , \quad m = 1, \ldots, M , \tag{28} $$

with respect to $\mathfrak{M}$-measurable functions $u_m : X \to \mathbb{R}$, $m = 1, \ldots, M$, whose expectation values $\langle u_m\rangle$, $m = 1, \ldots, M$, are (assumed to be) known a priori, along with the normalizing constraint $\int_X dP = 1$. (From now on we assume that any set of constraints on probability distributions implicitly includes this constraint, which will not be mentioned in the sequel.)

To maximize the entropy with respect to the constraints (28), the solution is calculated via the Lagrangian

$$ \mathcal{L}(x, \lambda, \beta) = -\int_X \ln\frac{dP}{d\mu}(x)\,dP(x) - \lambda\left(\int_X dP(x) - 1\right) - \sum_{m=1}^M \beta_m\left(\int_X u_m(x)\,dP(x) - \langle u_m\rangle\right) , \tag{29} $$

where $\lambda$ and $\beta_m$, $m = 1, \ldots, M$, are Lagrange parameters (we use the notation $\beta = (\beta_1, \ldots, \beta_M)$). The solution is given by

$$ \ln\frac{dP}{d\mu}(x) + 1 + \lambda + \sum_{m=1}^M \beta_m u_m(x) = 0 . $$

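The solution can be calculated as

$$ dP(x, \beta) = \exp\left[-\ln Z(\beta) - \sum_{m=1}^M \beta_m u_m(x)\right] d\mu(x) \tag{30} $$

or

$$ p(x, \beta) = \frac{dP}{d\mu}(x, \beta) = \frac{\exp\left(-\sum_{m=1}^M \beta_m u_m(x)\right)}{Z(\beta)} , \tag{31} $$

where the partition function $Z(\beta)$ is written as

$$ Z(\beta) = \int_X \exp\left(-\sum_{m=1}^M \beta_m u_m(x)\right) d\mu(x) . \tag{32} $$

The Lagrange parameters $\beta_m$, $m = 1, \ldots, M$, are specified by the set of constraints (28). The maximum entropy, denoted by S, can be calculated as

$$ S = \ln Z + \sum_{m=1}^M \beta_m\langle u_m\rangle . \tag{33} $$

The Lagrange parameters $\beta_m$, $m = 1, \ldots, M$, are calculated by searching for the unique solution (if it exists) of the following system of nonlinear equations:

$$ \frac{\partial}{\partial\beta_m}\ln Z(\beta) = -\langle u_m\rangle , \quad m = 1, \ldots, M . \tag{34} $$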
We also have

$$ \frac{\partial S}{\partial\langle u_m\rangle} = \beta_m , \quad m = 1, \ldots, M . \tag{35} $$

Equations (34) and (35) are referred to as the thermodynamic equations.
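The canonical distribution (31)-(32) and the thermodynamic equations can be illustrated on a finite set with $\mu$ the counting measure and a single constraint (M = 1). This is a sketch under those assumptions; solving (34) by bisection is a choice made here, valid because $\langle u\rangle$ is non-increasing in $\beta$ (its derivative is $-\mathrm{Var}(u)$). The helper names are illustrative.

```python
import math

def gibbs(u, beta):
    """Canonical pmf (31) and partition function (32), mu = counting measure."""
    w = [math.exp(-beta * x) for x in u]
    z = sum(w)
    return [x / z for x in w], z

def expectation(p, u):
    return sum(pk * uk for pk, uk in zip(p, u))

def solve_beta(u, target, lo=-50.0, hi=50.0, iters=200):
    """Find beta with <u> = target by bisection (<u> decreases as beta grows)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p, _ = gibbs(u, mid)
        if expectation(p, u) > target:
            lo = mid          # mean too large -> a larger beta is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = [0.0, 1.0, 2.0, 3.0]
target = 1.0
beta = solve_beta(u, target)
p, z = gibbs(u, beta)
S = -sum(pk * math.log(pk) for pk in p)
print(beta, expectation(p, u))
print(S, math.log(z) + beta * target)   # eq. (33)

# eq. (34): d ln Z / d beta = -<u>, checked with a central difference
h = 1e-5
dlnz = (math.log(gibbs(u, beta + h)[1]) - math.log(gibbs(u, beta - h)[1])) / (2 * h)
print(dlnz, -expectation(p, u))
```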



5. ME prescription for Tsallis Entropy 

The great success of Tsallis entropy is attributed to the power-law distributions one can derive as maximum entropy distributions by maximizing Tsallis entropy with respect to moment constraints. But there are subtleties involved in the choice of constraints for the ME prescriptions of these entropy functionals. These subtleties are still part of the major discussion in the nonextensive formalism [25, ?].

In the nonextensive formalism, maximum entropy distributions are derived with respect to constraints which are different from (28), the constraints used for classical information measures. Constraints of the form (28) are inadequate for handling the serious mathematical difficulties that arise (see [27]). To handle these difficulties, constraints of the form

$$ \frac{\int_X u_m(x)p(x)^q\,d\mu(x)}{\int_X p(x)^q\,d\mu(x)} = \langle\langle u_m\rangle\rangle_q , \quad m = 1, \ldots, M , \tag{36} $$

are proposed. (36) can be considered as the expectation with respect to the modified probability measure $P_{(q)}$ (it is indeed a probability measure) defined as

$$ P_{(q)}(E) = \left(\int_X p(x)^q\,d\mu\right)^{-1}\int_E p(x)^q\,d\mu . \tag{37} $$

The measure $P_{(q)}$ is known as the escort probability measure.
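On a finite set with $\mu$ the counting measure, (37) is the pmf proportional to $p_k^q$ and (36) is the ordinary mean of u under it. This illustrative sketch (hypothetical helper names, arbitrary numbers) shows that q = 1 gives back the usual expectation, while q > 1 weights high-probability states more heavily and q < 1 flattens the distribution.

```python
def escort(p, q):
    """Escort pmf P_(q): p_k^q / sum_j p_j^q, eq. (37) with the counting measure."""
    w = [pk ** q for pk in p]
    s = sum(w)
    return [x / s for x in w]

def q_expectation(u, p, q):
    """<<u>>_q of eq. (36): the ordinary expectation of u under the escort pmf."""
    return sum(ek * uk for ek, uk in zip(escort(p, q), u))

p = [0.6, 0.3, 0.1]
u = [0.0, 1.0, 2.0]
for q in (0.5, 1.0, 2.0):
    print(q, escort(p, q), q_expectation(u, p, q))
```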

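The variational principle for Tsallis entropy maximization with respect to the constraints (36) can be written as

$$ \mathcal{L}(x, \lambda, \beta) = \int_X \ln_q\frac{1}{p(x)}\,dP(x) - \lambda\left(\int_X dP(x) - 1\right) - \sum_{m=1}^M \beta_m^{*}\left(\int_X p(x)^{q-1}\left(u_m(x) - \langle\langle u_m\rangle\rangle_q\right) dP(x)\right) , \tag{38} $$

where the parameters $\beta_m^{*}$ can be defined in terms of the true Lagrange parameters $\beta_m$ as

$$ \beta_m^{*} = \left(\int_X p(x)^q\,d\mu\right)^{-1}\beta_m , \quad m = 1, \ldots, M . \tag{39} $$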



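The maximum entropy distribution in this case can be written as

$$ p(x) = \frac{1}{\bar{Z}_q}\left[1 - (1-q)\left(\int_X p(x)^q\,d\mu\right)^{-1}\sum_{m=1}^M \beta_m\left(u_m(x) - \langle\langle u_m\rangle\rangle_q\right)\right]^{\frac{1}{1-q}} \tag{40} $$

or, in terms of the parameters (39),

$$ p(x) = \frac{1}{\bar{Z}_q}\left[1 - (1-q)\sum_{m=1}^M \beta_m^{*}\left(u_m(x) - \langle\langle u_m\rangle\rangle_q\right)\right]^{\frac{1}{1-q}} , \tag{41} $$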
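where

$$ \bar{Z}_q = \int_X \left[1 - (1-q)\sum_{m=1}^M \beta_m^{*}\left(u_m(x) - \langle\langle u_m\rangle\rangle_q\right)\right]^{\frac{1}{1-q}} d\mu(x) . \tag{42} $$

The maximum Tsallis entropy in this case satisfies

$$ S_q = \ln_q \bar{Z}_q , \tag{43} $$

while the corresponding thermodynamic equations can be written as

$$ \frac{\partial}{\partial\beta_m}\ln_q Z_q = -\langle\langle u_m\rangle\rangle_q , \quad m = 1, \ldots, M , \tag{44} $$

$$ \frac{\partial S_q}{\partial\langle\langle u_m\rangle\rangle_q} = \beta_m , \quad m = 1, \ldots, M , \tag{45} $$

where

$$ \ln_q Z_q = \ln_q \bar{Z}_q - \sum_{m=1}^M \beta_m\langle\langle u_m\rangle\rangle_q . \tag{46} $$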
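Because $\langle\langle u_m\rangle\rangle_q$ and $\int_X p^q\,d\mu$ in (40) depend on p itself, the ME distribution is defined self-consistently. The sketch below solves (40)-(42) on a finite set with $\mu$ the counting measure and M = 1 by a damped fixed-point iteration. The iteration scheme, damping factor and parameter values are assumptions of this sketch, not a procedure from the paper, and convergence is not guaranteed for every $\beta$ and q. At the fixed point it checks (43), $S_q = \ln_q\bar{Z}_q$.

```python
import math

def ln_q(x, q):
    """q-logarithm (x^(1-q) - 1)/(1-q)."""
    return math.log(x) if q == 1 else (x ** (1.0 - q) - 1.0) / (1.0 - q)

def tsallis_max_ent(u, beta, q, iters=500, damping=0.5):
    """Hypothetical solver for eqs. (40)-(42), M = 1, finite X, counting measure."""
    n = len(u)
    p = [1.0 / n] * n
    zbar = float(n)
    for _ in range(iters):
        c = sum(pk ** q for pk in p)                            # int p^q dmu
        uq = sum(pk ** q * uk for pk, uk in zip(p, u)) / c      # <<u>>_q, eq. (36)
        base = [1.0 - (1.0 - q) * beta * (uk - uq) / c for uk in u]
        # Tsallis cut-off for non-positive base (relevant for q < 1)
        w = [b ** (1.0 / (1.0 - q)) if b > 0 else 0.0 for b in base]
        zbar = sum(w)                                           # eq. (42)
        new = [wk / zbar for wk in w]                           # eq. (40)
        p = [damping * a + (1.0 - damping) * b for a, b in zip(p, new)]
    return p, zbar

u = [0.0, 1.0, 2.0, 3.0]
q, beta = 0.8, 0.5
p, zbar = tsallis_max_ent(u, beta, q)
s_q = (1.0 - sum(pk ** q for pk in p)) / (q - 1.0)   # eq. (23) with counting measure
print(p)
print(s_q, ln_q(zbar, q))                            # eq. (43)
```

The check works because, at the fixed point, $\int_X p^q\,d\mu = \bar{Z}_q^{\,1-q}$, which turns (23) into $\ln_q\bar{Z}_q$.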
m=l 

6. Measure-Theoretic Definitions: Revisited

It is well known that, unlike Shannon entropy, the Kullback-Leibler relative-entropy in the discrete case can be extended naturally to the measure-theoretic case. In this section, we show that this fact is true for generalized relative-entropies too. The Renyi relative-entropy on the continuous-valued space $\mathbb{R}$ and its equivalence with the discrete case were studied by Renyi [?]. Here, we present the result in the measure-theoretic case and conclude that the measure-theoretic definitions of both the Tsallis and the Renyi relative-entropies are equivalent to their discrete counterparts.

We also present a result pertaining to ME of measure-theoretic Tsallis entropy. We 
show that ME of Tsallis entropy in the measure-theoretic case is consistent with the 
discrete case. 

6.1. On Measure-Theoretic Definitions of Generalized Relative-Entropies

Here we show that generalized relative-entropies in the discrete case can be naturally extended to the measure-theoretic case, in the sense that the measure-theoretic definitions can be obtained as the limit of a sequence of finite discrete entropies of pmfs which approximate the pdfs involved. We call such a sequence of pmfs an "approximating sequence of pmfs of a pdf". To formalize these aspects we need the following lemma.

Lemma 6.1. Let p be a pdf defined on the measure space $(X, \mathfrak{M}, \mu)$. Then there exists a sequence of simple functions $\{f_n\}$ (we refer to it as the approximating sequence of simple functions of p) such that $\lim_{n\to\infty} f_n = p$ and each $f_n$ can be written as

$$ f_n(x) = \frac{1}{\mu(E_{n,k})}\int_{E_{n,k}} p\,d\mu , \quad \forall x \in E_{n,k},\ k = 1, \ldots, m(n) , \tag{47} $$

where $(E_{n,1}, \ldots, E_{n,m(n)})$ is the measurable partition corresponding to $f_n$ (the notation m(n) indicates that m varies with n). Further, each $f_n$ satisfies

$$ \int_X f_n\,d\mu = 1 . \tag{48} $$

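Proof. Define a sequence of simple functions $\{f_n\}$ as

$$ f_n(x) = \begin{cases} \dfrac{1}{\mu\left(p^{-1}\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\right)\right)}\displaystyle\int_{p^{-1}\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\right)} p\,d\mu , & \text{if } \frac{k}{2^n} \le p(x) < \frac{k+1}{2^n},\ k = 0, 1, \ldots, n2^n - 1 , \\[2ex] \dfrac{1}{\mu\left(p^{-1}([n, \infty))\right)}\displaystyle\int_{p^{-1}([n, \infty))} p\,d\mu , & \text{if } n \le p(x) . \end{cases} \tag{49} $$

Each $f_n$ is indeed a simple function and can be written as

$$ f_n = \sum_{k=0}^{n2^n - 1}\left(\frac{1}{\mu(E_{n,k})}\int_{E_{n,k}} p\,d\mu\right)\chi_{E_{n,k}} + \left(\frac{1}{\mu(F_n)}\int_{F_n} p\,d\mu\right)\chi_{F_n} , \tag{50} $$

where $E_{n,k} = p^{-1}\left(\left[\frac{k}{2^n}, \frac{k+1}{2^n}\right)\right)$, $k = 0, \ldots, n2^n - 1$, and $F_n = p^{-1}([n, \infty))$. Since $\int_E p\,d\mu < \infty$ for any $E \in \mathfrak{M}$, we have $\int_{E_{n,k}} p\,d\mu = 0$ whenever $\mu(E_{n,k}) = 0$, for $k = 0, \ldots, n2^n - 1$. Similarly, $\int_{F_n} p\,d\mu = 0$ whenever $\mu(F_n) = 0$. Now we show that $\lim_{n\to\infty} f_n = p$ pointwise.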

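First assume that $p(x) < \infty$. Then $\exists n \in \mathbb{Z}^+$ such that $p(x) < n$ (and the same holds for every larger n). Also $\exists k \in \{0, 1, \ldots, n2^n - 1\}$ such that $\frac{k}{2^n} \le p(x) < \frac{k+1}{2^n}$ and $\frac{k}{2^n} \le f_n(x) \le \frac{k+1}{2^n}$. This implies $0 \le |p(x) - f_n(x)| \le \frac{1}{2^n}$, as required.

If $p(x) = \infty$ for some $x \in X$, then $x \in F_n$ for all n, and therefore $f_n(x) \ge n$ for all n; hence $\lim_{n\to\infty} f_n(x) = \infty = p(x)$.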
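Finally, writing $\{E_{n,1}, \ldots, E_{n,m(n)}\}$ for the partition consisting of the sets $E_{n,k}$ and $F_n$, we have

$$ \int_X f_n\,d\mu = \sum_{k=1}^{m(n)}\left(\frac{1}{\mu(E_{n,k})}\int_{E_{n,k}} p\,d\mu\right)\mu(E_{n,k}) = \sum_{k=1}^{m(n)}\int_{E_{n,k}} p\,d\mu = \int_X p\,d\mu = 1 . $$

□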

The above construction of a sequence of simple functions which approximate a measurable function is similar to the approximation theorem [?, pp. 6, Theorem 1.8(b)] in the theory of integration. However, the approximation in Lemma 6.1 can be seen as a mean-value approximation, whereas in the latter case it is a lower approximation. Further, unlike the lower approximation, the sequence of simple functions which approximates p in Lemma 6.1 is neither monotone nor does it satisfy $f_n \le p$.
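The mean-value construction (49) can be sketched numerically for X = [0, 1] with $\mu$ the Lebesgue measure. As an assumption of this sketch, $\mu$ is approximated by a uniform grid of cells, and each level set is represented by the cells whose midpoints fall in it; the density $p(x) = 2x$ and the helper names are illustrative. For this density the error stays below $2^{-n}$, as in the proof, and $\int_X f_n\,d\mu = 1$ as in (48).

```python
import math
from collections import defaultdict

def level_index(value, n):
    """Index of the set of the stage-n partition in (49)-(50) that contains `value`."""
    if value >= n:
        return n * 2 ** n                 # the tail set F_n = p^{-1}([n, inf))
    return math.floor(value * 2 ** n)     # E_{n,k} = p^{-1}([k/2^n, (k+1)/2^n))

def mean_value_approximation(p, n, grid=2 ** 12):
    """Return grid midpoints x_i and f_n(x_i), where f_n averages p over each level set (47)."""
    h = 1.0 / grid
    xs = [(i + 0.5) * h for i in range(grid)]
    mass = defaultdict(float)             # int_E p dmu
    size = defaultdict(float)             # mu(E)
    for x in xs:
        k = level_index(p(x), n)
        mass[k] += p(x) * h
        size[k] += h
    fn = [mass[level_index(p(x), n)] / size[level_index(p(x), n)] for x in xs]
    return xs, fn

p = lambda x: 2.0 * x
for n in (2, 4, 6):
    xs, fn = mean_value_approximation(p, n)
    err = max(abs(f - p(x)) for x, f in zip(xs, fn))
    total = sum(fn) / len(xs)             # approximates int f_n dmu, eq. (48)
    print(n, err, 2.0 ** -n, total)
```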
Now one can define a sequence of pmfs $\{\tilde{p}_n\}$ corresponding to the sequence of simple functions constructed in Lemma 6.1, denoted by $\tilde{p}_n = (\tilde{p}_{n,1}, \ldots, \tilde{p}_{n,m(n)})$, as

$$ \tilde{p}_{n,k} = \mu(E_{n,k})\,f_n(x)\big|_{x \in E_{n,k}} = \int_{E_{n,k}} p\,d\mu , \quad k = 1, \ldots, m(n) , \tag{51} $$

for any n. We have

$$ \sum_{k=1}^{m(n)}\tilde{p}_{n,k} = \sum_{k=1}^{m(n)}\int_{E_{n,k}} p\,d\mu = \int_X p\,d\mu = 1 , \tag{52} $$

and hence $\tilde{p}_n$ is indeed a pmf. We call $\{\tilde{p}_n\}$ the approximating sequence of pmfs of the pdf p.

Now we present our main theorem, where we assume that p and r are bounded. The assumption of boundedness of p and r simplifies the proof; however, the result can be extended to the unbounded case. See [21] for an analysis of Shannon entropy and relative-entropy on $\mathbb{R}$.

Theorem 6.2. Let p and r be bounded pdfs defined on a measure space $(X, \mathfrak{M}, \mu)$. Let $\tilde{p}_n$ and $\tilde{r}_n$ be the approximating sequences of pmfs of p and r respectively. Let $I_\alpha$ denote the Renyi relative-entropy as in (20) and $I_q$ denote the Tsallis relative-entropy as in (25). Then

$$ \lim_{n\to\infty} I_\alpha(\tilde{p}_n\|\tilde{r}_n) = I_\alpha(p\|r) \tag{53} $$

and

$$ \lim_{n\to\infty} I_q(\tilde{p}_n\|\tilde{r}_n) = I_q(p\|r) . \tag{54} $$
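Theorem 6.2 can be illustrated numerically with the bounded pdfs $p(x) = 2x$ and $r(x) = 1$ on [0, 1], for which $I_2(p\|r) = \ln\frac{4}{3}$. The sketch below builds the pmfs (51) on the common refinement of the two stage-n partitions and shows $I_\alpha(\tilde{p}_n\|\tilde{r}_n)$ approaching the exact value. As assumptions of this sketch, the Lebesgue measure is approximated by a uniform grid, and the helper names are hypothetical.

```python
import math
from collections import defaultdict

def level_index(value, n):
    """Index of the stage-n set of (49)-(50) containing `value`."""
    if value >= n:
        return n * 2 ** n
    return math.floor(value * 2 ** n)

def approximating_pmfs(p, r, n, grid=2 ** 12):
    """Pmfs (51) of p and r on the common refinement of their stage-n partitions (X = [0, 1])."""
    h = 1.0 / grid
    mass_p, mass_r = defaultdict(float), defaultdict(float)
    for i in range(grid):
        x = (i + 0.5) * h
        key = (level_index(p(x), n), level_index(r(x), n))  # sets E_i ∩ F_j
        mass_p[key] += p(x) * h
        mass_r[key] += r(x) * h
    keys = sorted(mass_p)
    return [mass_p[k] for k in keys], [mass_r[k] for k in keys]

def renyi_divergence(pk, rk, alpha):
    """Discrete Renyi relative-entropy, cf. (59)."""
    s = sum(a ** alpha * b ** (1.0 - alpha) for a, b in zip(pk, rk) if a > 0)
    return math.log(s) / (alpha - 1.0)

p = lambda x: 2.0 * x         # bounded pdf on [0, 1]
r = lambda x: 1.0             # uniform pdf on [0, 1]
alpha = 2.0
exact = math.log(4.0 / 3.0)   # I_2(p||r) = ln int_0^1 (2x)^2 dx
for n in (2, 4, 6):
    pn, rn = approximating_pmfs(p, r, n)
    print(n, renyi_divergence(pn, rn, alpha), exact)
```

The corresponding Tsallis values follow from $I_q = \frac{e^{(q-1)I_\alpha} - 1}{q-1}$ with $q = \alpha$, the monotone relation invoked at the start of the proof.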

Proof. It is enough to prove the result for either the Tsallis or the Renyi relative-entropy, since each is a monotone and continuous function of the other (with $q = \alpha$). Hence we write down the proof for the case of Renyi, and we use the entropic index $\alpha$ in the proof.

Corresponding to pdf p, let {/„} be the approximating sequence of simple functions 
such that lim„^oo /n = P as in Lemma 16.11 Let {gn be the approximating sequence of 
simple functions for r such that lim^^oo dn = ^- Corresponding to simple functions 
and gn there exists a common measurable partition* {-En,!, • • • -E'ri,m(n)} such that fn and 
gn can be written as 

m(n) 

fn{x) = Y{an,k)XE,,^k{^) , a„,fc G IR+, VA; = 1, . . .m(n) , (55) 
fc=i 

m{n) 

gn{x) = 'Y{bn,k)XE„,k{x) , 6„,fe G M+, VA; = 1, . . . m(n) , (56) 

k=l 

* Let ip and (j> are two simple functions defined on {X,Ti}. Let {Ei,...En} and {Fi,...,Fm} 
be the measurable partitions corresponding to tp and (f) respectively. Then partition defined as 
{Ei n Ej\i = 1, . . .n, j = 1, . . . m} is a common measurable partition for both ip and 0. 
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where XE„k characteristic function of En^^, for k = l,...m(n). By (f^ 

and (j^Bjl the approximating sequences of pmfs {pn = {Pn,i, ■ ■ ■ ,Pn,m{n))} and {f„ = 
(f„^i, . . . , fn,m{n))} ^au bc Written as (see (jHTjl ) 

Pn,fc = a„,fc/i(E„,fe) A; = 1, . . . , , (57) 

^„,fc = bn,kfJ'{En±) k = 1,..., m{n) . (58) 
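Now the Renyi relative-entropy of $\tilde{p}_n$ and $\tilde{r}_n$ can be written as

$$I_\alpha(\tilde{p}_n \| \tilde{r}_n) = \frac{1}{\alpha - 1} \ln \sum_{k=1}^{m(n)} \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}}\, \mu(E_{n,k}) . \tag{59}$$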
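The common-partition construction in the footnote and equations (57) to (59) can be checked on a toy case. The Python sketch below assumes $X = [0, 1)$ with Lebesgue measure and two hypothetical simple pdfs described by interval breakpoints and values; for interval partitions, the nonempty cells $E_i \cap F_j$ are exactly the intervals between consecutive points of the union of the breakpoints. It then forms the pmfs (57) and (58) and confirms that (59) agrees with the usual discrete formula $\frac{1}{\alpha-1}\ln\sum_k \tilde{p}_{n,k}^{\alpha}\tilde{r}_{n,k}^{1-\alpha}$.

```python
import math
from bisect import bisect_right


def step_value(breaks, values, x):
    """Value at x of a step function on [0, 1); values[j] holds on the j-th interval."""
    return values[bisect_right(breaks, x)]


def common_partition(breaks_f, vals_f, breaks_g, vals_g):
    """Cells E_i ∩ F_j of two interval partitions of [0, 1), as (mu(E), a, b) triples."""
    edges = [0.0] + sorted(set(breaks_f) | set(breaks_g)) + [1.0]
    cells = []
    for left, right in zip(edges[:-1], edges[1:]):
        mid = 0.5 * (left + right)  # f and g are both constant on (left, right)
        cells.append((right - left,                        # mu(E_{n,k})
                      step_value(breaks_f, vals_f, mid),   # a_{n,k}
                      step_value(breaks_g, vals_g, mid)))  # b_{n,k}
    return cells


# Hypothetical simple pdfs f and g defined on different partitions of [0, 1).
cells = common_partition([0.5], [0.5, 1.5], [0.25, 0.75], [1.2, 0.8, 1.2])
p_tilde = [a * m for m, a, b in cells]  # (57)
r_tilde = [b * m for m, a, b in cells]  # (58)

alpha = 2.0
# (59), written with the values a_{n,k}, b_{n,k} of the simple functions
lhs = math.log(sum(a ** alpha / b ** (alpha - 1) * m for m, a, b in cells)) / (alpha - 1)
# the same quantity computed directly from the pmfs
rhs = math.log(sum(pk ** alpha * rk ** (1 - alpha)
                   for pk, rk in zip(p_tilde, r_tilde))) / (alpha - 1)
print(p_tilde, r_tilde, lhs, rhs)
```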

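To prove $\lim_{n\to\infty} I_\alpha(\tilde{p}_n \| \tilde{r}_n) = I_\alpha(p \| r)$ it is enough to prove that

$$\lim_{n\to\infty} \frac{1}{\alpha-1} \ln \int_X \frac{f_n(x)^{\alpha}}{g_n(x)^{\alpha-1}}\, d\mu(x) = \frac{1}{\alpha-1} \ln \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) , \tag{60}$$

since we have

$$\int_X \frac{f_n(x)^{\alpha}}{g_n(x)^{\alpha-1}}\, d\mu(x) = \sum_{k=1}^{m(n)} \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}}\, \mu(E_{n,k}) . \tag{61}$$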



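Further, it is enough to prove that

$$\lim_{n\to\infty} \int_X h_n(x)^{\alpha}\, g_n(x)\, d\mu(x) = \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) , \tag{62}$$

where $h_n$ is defined as $h_n(x) = \frac{f_n(x)}{g_n(x)}$.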
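Case 1: 0 < α < 1.

In this case the Lebesgue dominated convergence theorem [?, pp. 26] gives

$$\lim_{n\to\infty} \int_X \frac{f_n(x)^{\alpha}}{g_n(x)^{\alpha-1}}\, d\mu(x) = \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) , \tag{63}$$

and hence (60).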
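Case 2: α > 1.

The simple functions $(f_n)^{\alpha}$ and $(g_n)^{\alpha-1}$ can be written as

$$(f_n)^{\alpha}(x) = \sum_{k=1}^{m(n)} a_{n,k}^{\alpha}\, \chi_{E_{n,k}}(x) \quad \text{and} \quad (g_n)^{\alpha-1}(x) = \sum_{k=1}^{m(n)} b_{n,k}^{\alpha-1}\, \chi_{E_{n,k}}(x) .$$

Further,

$$\frac{(f_n)^{\alpha}}{(g_n)^{\alpha-1}}(x) = \sum_{k=1}^{m(n)} \frac{a_{n,k}^{\alpha}}{b_{n,k}^{\alpha-1}}\, \chi_{E_{n,k}}(x) .$$

Since $h_n^{\alpha} g_n \to p^{\alpha}/r^{\alpha-1}$ pointwise as $n \to \infty$, Fatou's Lemma [?, pp. 23] gives

$$\liminf_{n\to\infty} \int_X h_n(x)^{\alpha}\, g_n(x)\, d\mu(x) \geq \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) .$$

From the construction of $f_n$ and $g_n$ (Lemma 6.1) we have

$$h_n(x) = \frac{\int_{E_{n,i}} \frac{p(y)}{r(y)}\, r(y)\, d\mu(y)}{\int_{E_{n,i}} r(y)\, d\mu(y)} , \quad \forall x \in E_{n,i} .$$

Since $t \mapsto t^{\alpha}$ is convex for α > 1, Jensen's inequality, applied on $E_{n,i}$ with respect to the probability measure $r\,d\mu / \int_{E_{n,i}} r\,d\mu$, gives

$$h_n(x)^{\alpha} \leq \frac{\int_{E_{n,i}} \left( \frac{p(y)}{r(y)} \right)^{\alpha} r(y)\, d\mu(y)}{\int_{E_{n,i}} r(y)\, d\mu(y)} , \quad \forall x \in E_{n,i} .$$

By (57) and (58), $h_n = a_{n,i}/b_{n,i}$ on $E_{n,i}$ and $\int_{E_{n,i}} r\, d\mu = b_{n,i}\, \mu(E_{n,i})$, so the above inequality can be written as

$$\frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}}\, \mu(E_{n,i}) \leq \int_{E_{n,i}} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) , \quad i = 1, \ldots, m(n) .$$

Summing both sides over i, we get

$$\sum_{i=1}^{m(n)} \frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}}\, \mu(E_{n,i}) \leq \sum_{i=1}^{m(n)} \int_{E_{n,i}} \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) .$$

Since $\{E_{n,i}\}$ is a partition of X, this is nothing but

$$\int_X h_n(x)^{\alpha}\, g_n(x)\, d\mu(x) \leq \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) ,$$

and hence

$$\sup_{i \geq n} \int_X h_i(x)^{\alpha}\, g_i(x)\, d\mu(x) \leq \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) .$$

Finally we have

$$\limsup_{n\to\infty} \int_X h_n(x)^{\alpha}\, g_n(x)\, d\mu(x) \leq \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) .$$

Combining this with the lim inf bound obtained from Fatou's Lemma, we have

$$\lim_{n\to\infty} \int_X \frac{f_n(x)^{\alpha}}{g_n(x)^{\alpha-1}}\, d\mu(x) = \int_X \frac{p(x)^{\alpha}}{r(x)^{\alpha-1}}\, d\mu(x) ,$$

and hence (60). ∎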
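The key step of Case 2 is the per-cell Jensen bound $\frac{a_{n,i}^{\alpha}}{b_{n,i}^{\alpha-1}}\mu(E_{n,i}) \leq \int_{E_{n,i}} p^{\alpha}/r^{\alpha-1}\,d\mu$ for α > 1. The Python sketch below checks it numerically on each cell of an 8-cell equal-width partition, using the same hypothetical pdfs $p(x) = 2x$ and $r(x) = 1/2 + x$ on $[0, 1]$ with Lebesgue measure. Taking $a_{n,i}$ and $b_{n,i}$ as the cell averages of p and r is an assumption about the construction in Lemma 6.1, and the cell integrals are midpoint-rule approximations.

```python
# Hypothetical bounded pdfs on [0, 1] with Lebesgue measure (illustrative choice).
def p(x):
    return 2.0 * x

def r(x):
    return 0.5 + x

def p_cdf(x):
    return x * x

def r_cdf(x):
    return 0.5 * x + 0.5 * x * x


def midpoint(f, lo, hi, steps=5_000):
    """Midpoint-rule approximation of int_lo^hi f(x) dx."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (j + 0.5) * h) for j in range(steps))


def cellwise_bound(alpha, n):
    """Per-cell pairs (a^alpha / b^(alpha-1) * mu(E), int_E p^alpha / r^(alpha-1) dmu)."""
    pairs = []
    for k in range(n):
        lo, hi = k / n, (k + 1) / n
        mu_e = hi - lo
        a = (p_cdf(hi) - p_cdf(lo)) / mu_e  # value of f_n on E_{n,k} (cell average of p)
        b = (r_cdf(hi) - r_cdf(lo)) / mu_e  # value of g_n on E_{n,k} (cell average of r)
        lhs = a ** alpha / b ** (alpha - 1.0) * mu_e
        rhs = midpoint(lambda x: p(x) ** alpha / r(x) ** (alpha - 1.0), lo, hi)
        pairs.append((lhs, rhs))
    return pairs


for alpha in (1.5, 2.0, 3.0):
    pairs = cellwise_bound(alpha, n=8)
    holds = all(lhs <= rhs for lhs, rhs in pairs)
    print(alpha, holds, sum(l for l, _ in pairs), sum(rv for _, rv in pairs))
```

Summing the left-hand column gives $\int_X h_n^{\alpha} g_n\,d\mu$, so the printed totals illustrate the upper bound that leads to the lim sup estimate.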






6.2. On ME of Measure-Theoretic Definition of Tsallis Entropy

Despite the shortcoming that Shannon entropy cannot be naturally extended to the non-discrete case, we have observed that Shannon entropy in its general form on a measure space can be used consistently for ME prescriptions. One can easily see that the generalized information measures of Renyi and Tsallis also cannot be extended naturally to the measure-theoretic case. That is, the measure-theoretic definitions are not equivalent to the discrete ones in the sense that they cannot be obtained as the limit of a sequence of finite discrete entropies corresponding to pmfs defined on measurable partitions that approximate the pdf. The same counterexample discussed in § 2.1 can be used. We have already given the ME prescriptions of Tsallis entropy in the measure-theoretic case. In this section, we show that the ME prescriptions in the measure-theoretic case are consistent with the discrete case.
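The failure of the limit can be seen numerically. The Python sketch below is our own example, not the counterexample of § 2.1. It takes the hypothetical density $p(x) = 2x$ on $[0, 1]$ with Lebesgue reference measure, for which we take the measure-theoretic Tsallis entropy to be $(1 - \int_0^1 p^q\,dx)/(q-1) = (1 - 2^q/(q+1))/(q-1)$. It compares this value with the discrete Tsallis entropy $(1 - \sum_k \tilde{p}_{n,k}^{q})/(q-1)$ of the pmfs obtained by integrating p over n equal cells, which gives $\tilde{p}_{n,k} = (2k+1)/n^2$ for $k = 0, \ldots, n-1$.

```python
def tsallis_discrete(pmf, q):
    """Discrete Tsallis entropy (1 - sum_k P_k^q) / (q - 1), q != 1."""
    return (1.0 - sum(pk ** q for pk in pmf)) / (q - 1.0)


def approximating_pmf(n):
    """Cell probabilities of the hypothetical density p(x) = 2x on n equal cells."""
    return [(2 * k + 1) / n ** 2 for k in range(n)]


def tsallis_continuous(q):
    """(1 - int_0^1 (2x)^q dx) / (q - 1) = (1 - 2^q / (q + 1)) / (q - 1)."""
    return (1.0 - 2.0 ** q / (q + 1.0)) / (q - 1.0)


for q in (0.5, 2.0):
    discrete = [tsallis_discrete(approximating_pmf(n), q) for n in (10, 100, 1000, 10000)]
    print(q, tsallis_continuous(q), [round(v, 4) for v in discrete])
```

For q = 2 the discrete values tend to 1 while the continuous value is -1/3, and for q = 0.5 the discrete values grow without bound, so in neither case is the continuous expression the limit of the discrete entropies.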

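Proceeding as in the case of measure-theoretic entropy in § 2.3, the measure-theoretic Tsallis entropy $S_q(P)$ can be written in the discrete case as

$$S_q(P) = \sum_{k=1}^{n} P_k \ln_q \frac{\mu_k}{P_k} . \tag{71}$$

By the identity $\ln_q(x/y) = y^{q-1}\left( \ln_q x - \ln_q y \right)$ we get

$$S_q(P) = \sum_{k=1}^{n} P_k^{q} \left[ \ln_q \mu_k - \ln_q P_k \right] = \tilde{S}_q(P) + \sum_{k=1}^{n} P_k^{q} \ln_q \mu_k , \tag{72}$$

where $\tilde{S}_q(P)$ is the Tsallis entropy in the discrete case. When μ is the uniform distribution, i.e., $\mu_k = \frac{1}{n}$ for all $k = 1, \ldots, n$, we get

$$S_q(P) = \tilde{S}_q(P) - n^{q-1} \ln_q n \sum_{k=1}^{n} P_k^{q} . \tag{73}$$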
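Identities (72) and (73) can be checked numerically. The Python sketch below assumes the standard q-logarithm $\ln_q x = (x^{1-q} - 1)/(1-q)$ for q ≠ 1, assumes the discrete Tsallis entropy $\tilde{S}_q(P) = \sum_k P_k \ln_q(1/P_k)$, and uses randomly generated, strictly positive pmfs P and μ (hypothetical test data). It prints the difference between the two sides of each identity for a few values of q.

```python
import random


def ln_q(x, q):
    """q-logarithm (x^(1-q) - 1) / (1 - q); assumed to match the paper's ln_q (q != 1)."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_measure(P, mu, q):
    """Measure-theoretic Tsallis entropy in the discrete case, as in (71)."""
    return sum(pk * ln_q(mk / pk, q) for pk, mk in zip(P, mu))


def tsallis_discrete(P, q):
    """Ordinary discrete Tsallis entropy sum_k P_k ln_q(1 / P_k)."""
    return sum(pk * ln_q(1.0 / pk, q) for pk in P)


def residuals(P, mu, q):
    """Differences between the left- and right-hand sides of (72) and (73)."""
    n = len(P)
    rhs72 = tsallis_discrete(P, q) + sum(pk ** q * ln_q(mk, q) for pk, mk in zip(P, mu))
    uniform = [1.0 / n] * n
    rhs73 = tsallis_discrete(P, q) - n ** (q - 1.0) * ln_q(n, q) * sum(pk ** q for pk in P)
    return tsallis_measure(P, mu, q) - rhs72, tsallis_measure(P, uniform, q) - rhs73


def random_pmf(n, rng):
    weights = [rng.random() + 0.1 for _ in range(n)]  # strictly positive entries
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
P, mu = random_pmf(6, rng), random_pmf(6, rng)
for q in (0.5, 1.7, 3.0):
    print(q, residuals(P, mu, q))
```

Both residuals should be zero up to floating-point rounding for every q ≠ 1.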

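Now we show that the quantity $\sum_{k=1}^{n} P_k^{q}$ is constant in the maximization of $S_q(P)$ with respect to the set of constraints. The claim is that

$$\int_X p(x)^{q}\, d\mu(x) = (Z_q)^{1-q} , \tag{74}$$

which holds for the Tsallis maximum entropy distribution in general. This can be shown as follows. From the maximum entropy distribution we have

$$p(x)^{1-q} = \frac{1 - (1-q) \left( \int_X p(x)^{q}\, d\mu(x) \right)^{-1} \sum_{m} \beta_m \left( u_m(x) - \langle\langle u_m \rangle\rangle \right)}{(Z_q)^{1-q}} ,$$

which can be rearranged as

$$(Z_q)^{1-q}\, p(x) = \left[ 1 - (1-q) \frac{\sum_{m} \beta_m \left( u_m(x) - \langle\langle u_m \rangle\rangle \right)}{\int_X p(x)^{q}\, d\mu(x)} \right] p(x)^{q} .$$

By integrating both sides of the above equation and using the constraints, we get (74): the left-hand side integrates to $(Z_q)^{1-q}$ since $\int_X p\, d\mu = 1$, and the term involving $\beta_m$ vanishes on the right-hand side because $\langle\langle u_m \rangle\rangle = \int_X u_m(x)\, p(x)^{q}\, d\mu(x) \big/ \int_X p(x)^{q}\, d\mu(x)$.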
Now, (74) can be written in its discrete form as

$$\sum_{k=1}^{n} \frac{P_k^{q}}{\mu_k^{q-1}} = (Z_q)^{1-q} . \tag{75}$$

When μ is the uniform distribution we get

$$\sum_{k=1}^{n} P_k^{q} = n^{1-q} (Z_q)^{1-q} , \tag{76}$$

which is a constant.
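Relations (74) to (76) can be checked on a small discrete example. The Python sketch below is an illustrative device, not the paper's method. It assumes a single constraint function u, a fixed hypothetical Lagrange multiplier `beta` (instead of solving for a prescribed constraint value), ⟨⟨u⟩⟩ taken as the normalized q-expectation, and 0 < q < 1 so that the cut-off at zero is well defined. It finds a self-consistent ME density by plain fixed-point iteration and then compares $\int_X p^q\,d\mu$ with $(Z_q)^{1-q}$ and $\sum_k P_k^q$ with $n^{1-q}(Z_q)^{1-q}$ for a uniform reference measure.

```python
def tsallis_me(u, mu, q, beta, iters=500):
    """Fixed-point sketch of a discrete Tsallis ME density with one constraint.

    u: values u(x_k); mu: reference weights mu_k; beta: fixed (hypothetical)
    Lagrange multiplier. Returns (p, Z, c) where p_k is the density dP/dmu at x_k,
    Z is the normaliser Z_q and c = sum_k p_k^q mu_k.
    """
    assert 0.0 < q < 1.0, "sketch restricted to 0 < q < 1"
    p = [1.0 / sum(mu)] * len(u)  # start from the uniform density
    for _ in range(iters):
        c = sum(pk ** q * mk for pk, mk in zip(p, mu))                    # int p^q dmu
        U = sum(uk * pk ** q * mk for uk, pk, mk in zip(u, p, mu)) / c    # <<u>>
        w = [max(1.0 - (1.0 - q) * beta * (uk - U) / c, 0.0) ** (1.0 / (1.0 - q))
             for uk in u]
        Z = sum(wk * mk for wk, mk in zip(w, mu))                         # normaliser
        p = [wk / Z for wk in w]
    c = sum(pk ** q * mk for pk, mk in zip(p, mu))
    return p, Z, c


n = 4
u = [0.0, 1.0, 2.0, 3.0]
mu = [1.0 / n] * n                        # uniform reference probability measure
q, beta = 0.8, 0.5
p, Z, c = tsallis_me(u, mu, q, beta)
P = [pk * mk for pk, mk in zip(p, mu)]    # P_k = p_k mu_k
print(c, Z ** (1 - q))                                         # (74) / (75)
print(sum(Pk ** q for Pk in P), n ** (1 - q) * Z ** (1 - q))   # (76)
```

At a converged fixed point the two numbers on each printed line coincide up to rounding, as the integration argument for (74) predicts.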

Hence, by (73) and (76), one can conclude that, with respect to a particular instance of ME, the measure-theoretic Tsallis entropy $S_q(P)$ defined for a probability measure P on the measure space $(X, \mathfrak{M}, \mu)$ is equal to the discrete Tsallis entropy up to an additive constant when the reference measure μ is chosen as a uniform probability distribution. Thereby one can further conclude that, with respect to a particular instance of ME, the ME prescriptions of measure-theoretic Tsallis entropy are consistent with its discrete definition.

7. Conclusions 

In this paper, we presented measure-theoretic definitions of generalized information measures. We proved that the measure-theoretic definitions of generalized relative-entropies, Renyi and Tsallis, are natural extensions of their respective discrete cases. We also showed that the ME prescriptions of measure-theoretic Tsallis entropy are consistent with the discrete case.